fix(core): use (type, id) keys in vector search hydration to prevent id collisions by phernandez · Pull Request #986 · basicmachines-co/basic-memory

phernandez · 2026-06-12T13:19:56Z

Root cause

Entity, observation, and relation rows in search_index carry ids from independent auto-increment sequences, so rows of different types routinely share the same numeric id — guaranteed in young databases. In src/basic_memory/repository/search_repository_base.py:

_search_vector_only parsed each vector hit's chunk_key (e.g. entity:4:0) but discarded the type, keying similarity_by_si_id / chunks_by_si_id by bare id, which also collapsed colliding hits into one map slot.
_fetch_search_index_rows_by_ids fetched WHERE id IN (...) and keyed its result dict by bare row.id — whichever row the database returned last clobbered the other, so the clobbered hit hydrated against the wrong row or was silently dropped (e.g. an entity vanishing entirely under a search_item_types filter when a relation shared its id).
_search_hybrid fused FTS and vector results on bare row id, merging unrelated rows of different types and granting them a spurious dual-source fusion bonus.

The FTS-filter branch already guarded exactly this with (id, type) tuples — the primary vector lookup path and hybrid fusion missed the same treatment.
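The failure mode described above can be reproduced in a few lines. This is a minimal sketch, not the repository's code: `Row`, `parse_chunk_key`, and the sample data are hypothetical stand-ins, and the `type:id:chunk_index` layout is inferred from the `entity:4:0` example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """Hypothetical stand-in for a search_index row."""
    id: int
    type: str
    title: str


def parse_chunk_key(chunk_key: str) -> tuple[str, int, int]:
    """Split 'entity:4:0' into (type, id, chunk_index) (assumed layout)."""
    row_type, row_id, chunk_index = chunk_key.split(":")
    return row_type, int(row_id), int(chunk_index)


# Two rows of different types that share id=4.
rows = [Row(4, "entity", "Coffee notes"), Row(4, "relation", "relates_to")]
hits = {"entity:4:0": 0.91, "relation:4:0": 0.42}

# Buggy pattern: the type is parsed and then thrown away.
similarity_by_id: dict[int, float] = {}
for chunk_key, score in hits.items():
    _type, row_id, _chunk = parse_chunk_key(chunk_key)
    similarity_by_id[row_id] = score  # the second hit overwrites the first

# Buggy pattern: whichever row comes last wins the dict slot.
rows_by_id = {row.id: row for row in rows}

print(len(similarity_by_id), rows_by_id[4].type)  # prints: 1 relation
```

Two vector hits went in, but only one map slot and one row survive, and the entity is gone before any `search_item_types` filter runs.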

Fix

Introduce a SearchIndexKey = tuple[str, int] type alias and key every map in the vector/hybrid retrieval path by (type, id): the similarity and chunk maps in _search_vector_only, the _fetch_search_index_rows_by_ids result, the FTS-filter allowed-keys set, and the rows/fts/vec/fused score maps in _search_hybrid. SQL stays unchanged; bare ids are deduplicated before the IN query and rows are discriminated by row.type when building dict keys. The fix lives in the shared base class, so both SQLite and Postgres backends are covered.
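The shape of the fix might look roughly like the following. Only the `SearchIndexKey` alias and the `(type, id)` ordering come from this PR; `Row`, `key_from_chunk_key`, `ids_for_in_query`, and `index_rows` are hypothetical helper names used for illustration, and the in-memory list stands in for the database fetch.

```python
from collections.abc import Iterable
from dataclasses import dataclass

SearchIndexKey = tuple[str, int]  # (type, id), as named in the PR


@dataclass(frozen=True)
class Row:
    """Hypothetical stand-in for a search_index row."""
    id: int
    type: str
    title: str


def key_from_chunk_key(chunk_key: str) -> SearchIndexKey:
    """Keep the type when parsing 'entity:7:0' (assumed layout)."""
    row_type, row_id, _chunk_index = chunk_key.split(":")
    return row_type, int(row_id)


def ids_for_in_query(keys: Iterable[SearchIndexKey]) -> list[int]:
    """Bare ids, deduplicated, for an unchanged WHERE id IN (...) query."""
    return sorted({row_id for _type, row_id in keys})


def index_rows(rows: Iterable[Row]) -> dict[SearchIndexKey, Row]:
    """Discriminate fetched rows by row.type when building dict keys."""
    return {(row.type, row.id): row for row in rows}


hits = {"entity:7:0": 0.91, "relation:7:0": 0.42}
similarity: dict[SearchIndexKey, float] = {}
for chunk_key, score in hits.items():
    key = key_from_chunk_key(chunk_key)
    similarity[key] = max(score, similarity.get(key, 0.0))

# Stand-in for the rows the IN query would return for ids_for_in_query(...).
fetched = [Row(7, "entity", "Coffee notes"), Row(7, "relation", "relates_to")]
rows_by_key = index_rows(fetched)

hydrated = [(rows_by_key[k], s) for k, s in similarity.items() if k in rows_by_key]
entities_only = [row for row, _score in hydrated if row.type == "entity"]
```

Because the id list is deduplicated separately from the keys, the query still sees a single `7`, while both rows remain addressable after the fetch.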

Test evidence

test_sqlite_vector_search_survives_cross_type_id_collision (real SQLite + sqlite-vec harness): indexes an entity row and a relation row sharing id=7, syncs vectors for both, asserts vector retrieval returns both rows with correct types, and that a search_item_types=[ENTITY] filter still returns the entity.
test_cross_type_id_collision_keeps_both_results (hybrid fusion): an FTS entity and a vector relation sharing id=1 stay distinct with their single-source scores — no cross-type merge or fusion bonus.
Both tests fail on main without the src change (verified by stashing the fix: SQLite test returns 1 merged result instead of 2; hybrid test fuses the rows into one) and pass with it.
Existing mocked vector tests (test_vector_threshold.py, test_vector_pagination.py) updated for tuple-keyed fetch results.
415 passed / 19 skipped across tests/repository/, search service, semantic search, and search schema suites; ruff and ty clean on src tests test-int.
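The hybrid case can be illustrated with a small fusion sketch. The scoring formula used by `_search_hybrid` is not shown in this PR, so reciprocal rank fusion is swapped in here purely as an assumption; `fuse` and its `k` parameter are hypothetical. The point is only how the choice of key changes the outcome.

```python
from collections.abc import Hashable, Sequence
from typing import TypeVar

K = TypeVar("K", bound=Hashable)


def fuse(fts: Sequence[K], vec: Sequence[K], k: int = 60) -> dict[K, float]:
    """Reciprocal rank fusion (assumed stand-in): keys in both lists score twice."""
    fused: dict[K, float] = {}
    for ranking in (fts, vec):
        for rank, key in enumerate(ranking, start=1):
            fused[key] = fused.get(key, 0.0) + 1.0 / (k + rank)
    return fused


# An FTS entity hit and a vector relation hit that share id=1.
by_bare_id = fuse([1], [1])
by_type_and_id = fuse([("entity", 1)], [("relation", 1)])

print(len(by_bare_id), len(by_type_and_id))  # prints: 1 2
```

Keyed by bare id, the two unrelated rows merge into one entry that collects both contributions. Keyed by `(type, id)`, they stay separate with single-source scores, which is the behavior the hybrid test asserts.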

Fixes #982

🤖 Generated with Claude Code

…id collisions

Root cause: entity, observation, and relation rows in search_index carry ids from independent auto-increment sequences, so rows of different types routinely share the same numeric id (guaranteed in young databases). _search_vector_only parsed each vector hit's chunk_key (e.g. 'entity:4:0') but discarded the type, and _fetch_search_index_rows_by_ids keyed its result dict by bare row.id with no type discrimination. Whichever row the database returned last clobbered the other in the dict; the clobbered hit then hydrated against the wrong row or found None and was silently dropped from results. The FTS-filter branch already guarded this with (id, type) tuples, but the primary vector lookup path and the hybrid fusion maps missed the same treatment.

Fix: introduce a SearchIndexKey = tuple[str, int] alias and key every map in the vector/hybrid retrieval path by (type, id): the similarity and chunk maps in _search_vector_only, the _fetch_search_index_rows_by_ids result, the FTS-filter allowed keys, and the rows/fts/vec/fused score maps in _search_hybrid. The SQL stays unchanged; bare ids are deduped before the IN query and rows are discriminated by row.type when building dict keys.

Tests: end-to-end SQLite regression test indexes an entity row and a relation row sharing id 7, syncs vectors for both, and asserts vector search returns both rows (and that the entity survives a search_item_types filter); a hybrid fusion unit test asserts an entity and relation sharing id 1 stay distinct with single-source scores. Both fail without the fix. Existing mocked vector tests updated for tuple keys.

Fixes #982

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: phernandez <paul@basicmachines.co>

phernandez merged commit 253e240 into main Jun 12, 2026
23 checks passed

phernandez deleted the fix/982-vector-search-id-collision branch June 12, 2026 14:03

groksrc mentioned this pull request Jun 12, 2026

fix(mcp): keep search index type in vector hydration #984

Closed

phernandez merged 1 commit into main from fix/982-vector-search-id-collision
